24-script

DSAN 6750 / PPOL 6805: GIS for Spatial Data Science

Author
Affiliation

Christy Hsu

Georgetown University

Other Formats

Intro

This project attempts to provide insights into EV adoption patterns using approaches from spatial data science.

Seattle-Tacoma-Bellevue, WA Portland-Vancouver-Hillsboro, OR-WA Spokane-Spokane Valley, WA

Specifically, we want to understand how urbanicity plays in shaping the observed landscape of EV population in state of Washington.

Literature Review

Methodology

Set up

packages

loading files

Reading layer `ev1203' from data source 
  `/Users/toyuan/24-manuscript/data/ev1203.gpkg' using driver `GPKG'
Simple feature collection with 208002 features and 25 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -124.6274 ymin: 45.596 xmax: -117.0455 ymax: 48.99205
Geodetic CRS:  WGS 84
Reading layer `wa_poly' from data source 
  `/Users/toyuan/24-manuscript/data/wa_poly.gpkg' using driver `GPKG'
Simple feature collection with 1 feature and 14 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -124.849 ymin: 45.54354 xmax: -116.9161 ymax: 49.00244
Geodetic CRS:  NAD83
Reading layer `urban_shape' from data source 
  `/Users/toyuan/24-manuscript/data/urban_shape.gpkg' using driver `GPKG'
Simple feature collection with 56 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.1802 ymin: 45.78668 xmax: -117.0418 ymax: 49.0021
Geodetic CRS:  WGS 84
Reading layer `stations' from data source 
  `/Users/toyuan/24-manuscript/data/stations.gpkg' using driver `GPKG'
Simple feature collection with 2800 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -124.6629 ymin: 25.91606 xmax: -73.98274 ymax: 48.99526
Geodetic CRS:  WGS 84
Reading layer `ct1202' from data source 
  `/Users/toyuan/24-manuscript/data/ct1202.gpkg' using driver `GPKG'
Simple feature collection with 1784 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.849 ymin: 45.54354 xmax: -116.9161 ymax: 49.00244
Geodetic CRS:  WGS 84
Reading layer `ev1202-3' from data source 
  `/Users/toyuan/24-manuscript/data/ev1202-3.gpkg' using driver `GPKG'
Simple feature collection with 208002 features and 24 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -124.6274 ymin: 45.596 xmax: -117.0455 ymax: 48.99205
Geodetic CRS:  WGS 84
Reading layer `zcta-tl' from data source 
  `/Users/toyuan/24-manuscript/data/zcta-tl.gpkg' using driver `GPKG'
Simple feature collection with 605 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.7365 ymin: 45.54354 xmax: -116.9161 ymax: 49.00244
Geodetic CRS:  WGS 84
Reading layer `nov-ev' from data source 
  `/Users/toyuan/24-manuscript/data/nov-ev.gpkg' using driver `GPKG'
Simple feature collection with 17025 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -124.6274 ymin: 30.41628 xmax: -71.06739 ymax: 48.99976
Geodetic CRS:  WGS 84
Source: Article Notebook

function sf_to_ppp

function relative_int

ev_counts

Initial Evidence

Visualization

Before presenting our evidence of clustering in the spatial pattern of EVs, we would like to start with some clarification about the “points” and their locations in our data. Each observation in the dataset represents an EV with its unique DOL id. However, the longitude and latitude information corresponds to the centroid of that Zip Code Tabulation Area associated with the recorded address of the EV owner. Thus, 208,002 distinct EVs are matched to only 548 distinct points.

From the frequency table of counties, we can see that EV population exists across all counties in WA. When plotting EV count per county as a statistic, a skewed distribution gives us the first evidence of the variance in EV adoption between counties.

A map visualization of the EV locations and counts provides us a spatially clustering impression. Furthermore, when looking at the map we tended to suggest seeing clusters and identify them with the three of the main metropolitan areas. For example, Seattle-Tacoma-Bellevue Metropolitan area to the north west clusters; one mid-western cluster to Spokane-Spokane Valley Metropolitan area; one approximately in the county Clark with Vancouver city to take part of the cross states Portland-Vancouver-Hillsboro Metropolitan area.

urb_map <- ggplot() +
  geom_sf(data = ct_sf, fill = "lightgray", color = "white", size = 0.2) + 
  geom_sf(data = ev_sf, aes(size = count), color = 'cornflowerblue', alpha = 0.5) +
  geom_sf(data = urban_poly, aes(fill = "Urban Areas"), color = "pink", alpha = 0.3) +
  scale_size_continuous(name = "EV Registrations", range = c(1, 12)) +
  scale_fill_manual(name = "Legend", values = c("Urban Areas" = "pink")) +
  labs(
    title = "Distribution of EV registrations in WA",
    subtitle = "urban areas"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, size = 12)
  )

ggsave("image/manu-urb-ev.png", plot = urb_map)
urb_map

Using the definition provided by US Census Bureau, we can assign urban or rural labels to each registration. We learned from this classification that 180,490 registrations took place in urban areas, leaving 27,512 in rural areas. This spatial unevenness is evident when we impose the polygons of urban areas onto the EV population distribution.

Spatial Autocorrelation

In order to quantitatively measure the overall clusterness of our data and, regarding that clustering pattern is defined by having positive spatial autocorrelation, we go for the calculation the Global Moran’s I statistic.

global moran’s I

To estimate the autocorrelation value for our observations from Moran’s I formula, we have to specify our statistic of interest, a neighbor definition, and prepare a neighborhood relation and assign weights to neighbors.

We tried two ways to use spdep::moran for calculating Moran’s I. The first step here is to aggregate observed points into polygons, ZIP Code Tabulation Areas that contains 605 polygons and Census Tracts of 1784 polygons are two target of aggregation.1 When constructing the neighborhood relation both Rook and Queen defintions of neighbor are tried, and we use the default to assign equal weight to all neighbors. Both attempts are to point us to a postive autocorrelation of our observations, suggesting spatial clustering. To make sense of how significant our observed Moran’s I statistic is, we further did a permutation test using spdep::moran.mc() and acquired a pseudo p-value of 0.001 the most extreme value we can get in the case of 999 Monte Carlo Simulations, that is, the likelihood that our data generated from Complete Spatial Random process is a low one.

calculate the Moran’s I statistic sensitive to irregularly distributed polygons2

Moran’s I using ZCTAs as bounding polygons
[1] 0.6457255

a positive autocorrelation, significant?


    Monte-Carlo simulation of Moran I

data:  zcta_sf$n_ev 
weights: zcta_listw  
number of simulations + 1: 1000 

statistic = 0.64146, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Moran’s I using Census Tracts as bounding polygons

Rook definition of neighbor

[1] 208002

Moran’s I value

[1] 0.5136527
visualize
  • create sf containing the points corresponding to the centroids of the census tracts

    Monte-Carlo simulation of Moran I

data:  ct_sf$n_ev 
weights: ct_listw  
number of simulations + 1: 1000 

statistic = 0.51365, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

LISA: Local Moran’s I

With evidence of clustering globally, we also want information about where we can find the clusters. In this case, spdep::localmoran can help use with a Moran’s I statistic for each census tract. if a region i has a positive deviation from the mean and its neighbor also have high values to the values is going to be high. In our case, we are comparing the EV counts of the census tract i to the mean count and how i covariate with its neighbors.

[1] "localmoran" "matrix"     "array"     
[1] 1784    5
[1] 1784   26
ev_ct_local <- ct_sf7 |> ggplot() +
  geom_sf(aes(fill= as.numeric(local_i))) +
  labs(
    title = "Local moran's I by Census Tracts",
    fill = "local_i"
  ) +
  scale_fill_gradient2(low="darkblue", high="red",
  transform = "pseudo_log")
# +
  scale_fill_viridis_c(na.value = "gray")
# ggsave('image/ct_localmoran.png', ev_ct_local)
ev_ct_local
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.28334  0.03044  0.19644  0.51365  0.46033 26.79675 

finding interesting locations

ct_sf7$local_i <- ct_sf7$local_i |> as.numeric()
mapview(ct_sf7, zcol='local_i', label = "fips")

Hypothesis Testing (Regression)

To understand the first-order properties of EV adoption as a point pattern, we can look at our EV population distribution together with the human population distribution in WA. They appeared to follow very similar distribution, and we do recognize that the core of the definition of an urban area is the spatial distribution of population. Thus, it will be helpful to set aside other characteristics we associate with the urban label at the time, examine the relationship between the two populations in WA. We expect a better understanding of this relationship can guide us to extract the parts of EV adoption pattern, explainable by population density and unexplainable that how the effectiveness of explaining EV adoption in terms of population density migh varies from one urban area to another.

distribution_map <- ggplot() +
  geom_sf(data = ct_sf, fill = "lightgray", color = "white", size = 0.2) + 
  geom_sf(
    data = ev_sf,
    aes(size = count,
    color = cb_palette[5]),
    alpha = 0.6) +
    scale_size_continuous(name = "EV Registrations", range = c(0.1, 15)) + 
    labs(
    title = "Distribution of ZEVs in WA",
    subtitle = "Cumulative data to October 31, 2024",
    x = "longitude",
    y = "latitude"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14),
    plot.subtitle = element_text(hjust = 0.5, size = 11)
  ) +
  guides(color = "none")

ggsave("image/manu-ev-distribution.png", plot = distribution_map)

distribution_map

NASA GPW data provides us a glimpse of disaggregated human population data, so that we can conveniently take it as points data of population. The data that this map use are requested from APPEEARS.3(International Earth Science Information Network (CIESIN) - Columbia University 2018)

which they have a value for human population count for each 1 km^2 grid

  • 30 arc-seconds spatial resolution

gpw_raster

class      : RasterLayer 
dimensions : 416, 955, 397280  (nrow, ncol, ncell)
resolution : 0.008333333, 0.008333333  (x, y)
extent     : -124.8667, -116.9083, 45.54167, 49.00833  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : gpw-popcount.tif 
names      : gpw.popcount 
gpw.popcount geometry
0.00000 POLYGON ((-123.1333 49.0083…
32.73196 POLYGON ((-123.1 49.00833, …
1004.85199 POLYGON ((-123.0917 49.0083…
1176.74744 POLYGON ((-123.0833 49.0083…
1027.05786 POLYGON ((-123.075 49.00833…
374.41202 POLYGON ((-123.0667 49.0083…
[1] 304514      2
gpw_rasterpopcount_sf |> ggplot(aes(fill = GPW_UN_Adj_PopCount.411_population.count_doy2020183_aid0001)) +
  geom_sf() +
  scale_fill_viridis_c() + 
  theme_minimal() +
  labs(title = 'Population Count (GPW)',
       fill = 'Population')
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   0.000    0.000    0.144   24.585    1.504 7425.167 

to eliminate the bounding lines of these 1 km^2 grids, set geom_sf color = NA

gpw_map <- popcount_sf |> ggplot(aes(fill = pop)) +
  geom_sf(color = scales::alpha("white",0)) +
  scale_fill_viridis_c(trans = 'pseudo_log') + 
  theme_minimal() +
  labs(
    title = "Population Counts in WA",
    subtitle = 'GPW V.4 2020-07-01',
    fill = "Population Count"
    ) + 
  theme(
    plot.title = element_text(hjust=0.5),
    plot.subtitle = element_text(hjust=0.5)
    )

ggsave('image/manu-gpw-raster.png', plot = gpw_map)

gpw_map
un_map <- popcount_sf |> ggplot(aes(fill = pop)) +
  geom_sf(color = NA) + 
  scale_fill_viridis(option = 'C', trans = 'pseudo_log') + 
  theme_minimal() +
  labs(
    title = 'Population Count',
    subtitle = 'GPW V.4 2020-07-01',
    fill = 'Population Count'
    )

ggsave('image/manu-gpw-0701.png', plot = un_map)
un_map

Our null hypothesis is that people in WA have equal chance to adopt EV regardless of where they live, rural or urban.

First order properties of point process

Intensity Function

One important limitation of this analysis that needs to be addressed is that we use the WA state boundaries as our observation window. The drawback of this choice is quite obvious: among the three major metropolitan areas in WA, two are situated at the borders of the state – one with Oregon and the other with Idaho. This can lead to underestimations, as they can have less neighbors as the result of this bounding box.4

Intensity function we are interested here is the count of event , count of EVs and count of people in an area

Constructing the ppp object for estimating EV population intensity is a more straightforward one, we use the 548 distinct points and marked them with EV registration counts. When it comes to finding an appropriate way to estimate human population intensity, since ZCTAs are not a typical statistical division of US Census Bureau, thus, one way to approach this problem is to use the data of population counts by census tract and use the centroids of the census tracts to construct ppp object. Another way we explored is to take the 605 centroids of ZCTAs and request points estimate of population counts also from NASA APPEEARs.

zev_ppp

how the kernel density function smoothed the marked ppp?

choose a bandwidth for intensity estimate

[1] 70971.96
[1] 1.490786e-07
pop_int
[1] 7705281

# png('image/manu-ct-ratio.png', width = 750, height = 500)
relative_int(zev_ppp, pop_ppp)
# dev.off()

With these ppps at hand spatstat::density() can handle these marked ppp and estimate us the intensity function accounting the marks as weights.

gpw_ppp

[1] 308385.8

# png('image/manu-zpts-ratio.png', width = 750, height = 500)
relative_int(zev_ppp, gpw_ppp)
# dev.off()

Relative Intensity Surface

spatstat gave us a bw optimal variation kernel smoothing of a disc point within circle

if we want to use the gpw data for simulation

gpw_pop <- gpw_sf$pop_count |> sum()
all_pop <- ct_sf$DP1_0001C |> sum()
gpw_to_all <- (all_pop / gpw_pop)
gpw_to_all
gpw_ppp$marks <- gpw_to_all * gpw_ppp$marks
gpw_ppp$marks |> sum()

Monte Carlo Simulation

That is, the spatial distribution of population alone can not offer a comprehensive explanation of the EV spatial pattern, can be decent but not overall satisfying

divide regions for quadrat count

computational approach: 999 simulations
set.seed(6805)
gen_sims_ppp <- function(num_sims = 999) {
  ev_sims <- spatstat.random::rpoint(
    n = nrow(ev_sf),
    f = pop_int,
    nsim = num_sims
    )
  return(ev_sims)
}
n_sims <- 999
ev_sims_list <- gen_sims_ppp()
calculate test statistic from observations
   Low Medium   High 
  1644   9001 197357 
calculate test statistic from simulations
sims_region_counts <- lapply(
  X = ev_sims_list,
  FUN = compute_quadrat_counts
)
sim_counts_df <- as_tibble(sims_region_counts) |> t() |> as_tibble()
colnames(sim_counts_df) <- region_labels
sim_counts_df |> head(4)
bind_df <- bind_rows(sim_counts_df, obs_counts)
bind_df |> dim()
bind_df |> write_csv('data/manu-simulations.csv')

With the computational approach and the pseudo p-value we can say that

Second order

calculate test statistic from observations
   Low Medium   High 
  1644   9001 197357 
calculate test statistic from simulations
sims_region_counts <- lapply(
  X = ev_sims_list,
  FUN = compute_quadrat_counts
)
sim_counts_df <- as_tibble(sims_region_counts) |> t() |> as_tibble()
colnames(sim_counts_df) <- region_labels
sim_counts_df |> head(4)
bind_df <- bind_rows(sim_counts_df, obs_counts)
bind_df |> dim()
bind_df |> write_csv('data/manu-simulations.csv')

[1] "im"

## Discussion

Conclusion

import requests

api_key = AFDCKEY

url = f'https://developer.nrel.gov/api/alt-fuel-stations/v1.csv?api_key={api_key}&fuel_type=ELEC&state=WA&access=public'
output_file = 'data/ev_charging_stations_wa.csv'
response = requests.get(url)
if response.status_code == 200:
    with open(output_file, 'wb') as f:
        f.write(response.content)
    print(f"{output_file}")
else:
    print(f"status code {response.status_code}: {response.text}")
 [1] "Fuel Type Code"                         
 [2] "Station Name"                           
 [3] "Street Address"                         
 [4] "Intersection Directions"                
 [5] "City"                                   
 [6] "State"                                  
 [7] "ZIP"                                    
 [8] "Plus4"                                  
 [9] "Station Phone"                          
[10] "Status Code"                            
[11] "Expected Date"                          
[12] "Groups With Access Code"                
[13] "Access Days Time"                       
[14] "Cards Accepted"                         
[15] "BD Blends"                              
[16] "NG Fill Type Code"                      
[17] "NG PSI"                                 
[18] "EV Level1 EVSE Num"                     
[19] "EV Level2 EVSE Num"                     
[20] "EV DC Fast Count"                       
[21] "EV Other Info"                          
[22] "EV Network"                             
[23] "EV Network Web"                         
[24] "Geocode Status"                         
[25] "Latitude"                               
[26] "Longitude"                              
[27] "Date Last Confirmed"                    
[28] "ID"                                     
[29] "Updated At"                             
[30] "Owner Type Code"                        
[31] "Federal Agency ID"                      
[32] "Federal Agency Name"                    
[33] "Open Date"                              
[34] "Hydrogen Status Link"                   
[35] "NG Vehicle Class"                       
[36] "LPG Primary"                            
[37] "E85 Blender Pump"                       
[38] "EV Connector Types"                     
[39] "Country"                                
[40] "Intersection Directions (French)"       
[41] "Access Days Time (French)"              
[42] "BD Blends (French)"                     
[43] "Groups With Access Code (French)"       
[44] "Hydrogen Is Retail"                     
[45] "Access Code"                            
[46] "Access Detail Code"                     
[47] "Federal Agency Code"                    
[48] "Facility Type"                          
[49] "CNG Dispenser Num"                      
[50] "CNG On-Site Renewable Source"           
[51] "CNG Total Compression Capacity"         
[52] "CNG Storage Capacity"                   
[53] "LNG On-Site Renewable Source"           
[54] "E85 Other Ethanol Blends"               
[55] "EV Pricing"                             
[56] "EV Pricing (French)"                    
[57] "LPG Nozzle Types"                       
[58] "Hydrogen Pressures"                     
[59] "Hydrogen Standards"                     
[60] "CNG Fill Type Code"                     
[61] "CNG PSI"                                
[62] "CNG Vehicle Class"                      
[63] "LNG Vehicle Class"                      
[64] "EV On-Site Renewable Source"            
[65] "Restricted Access"                      
[66] "RD Blends"                              
[67] "RD Blends (French)"                     
[68] "RD Blended with Biodiesel"              
[69] "RD Maximum Biodiesel Level"             
[70] "NPS Unit Name"                          
[71] "CNG Station Sells Renewable Natural Gas"
[72] "LNG Station Sells Renewable Natural Gas"
[73] "Maximum Vehicle Class"                  
[74] "EV Workplace Charging"                  
[75] "Funding Sources"                        
evse_sf |> sf::st_write('evse.gpkg')
[1] 2798    2
          geometry        count      
 POINT        :2647   Min.   :1.000  
 epsg:4326    :   0   1st Qu.:1.000  
 +proj=long...:   0   Median :1.000  
                      Mean   :1.006  
                      3rd Qu.:1.000  
                      Max.   :2.000  
filt_stations <- station_counts |> filter(count == 2)
filt_stations |> mapview()
urb_station_map <- ggplot() +
  geom_sf(data = ct_sf, fill = "lightgray", color = "white", size = 0.2) + 
  geom_sf(data = wa_stations_sf, color = cb_palette[3], alpha = 0.6) +
  geom_sf(data = to_4326(urban_poly), aes(fill = "Urban Areas"), color = "pink", alpha = 0.3) +
  scale_fill_manual(name = "Legend", values = c("Urban Areas" = "pink")) +
  labs(
    title = "Distribution of Public EV Charging Stations in WA",
    subtitle = "urban areas"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, size = 12)
  )

ggsave("image/manu-urb-stations.png", plot = urb_station_map)
urb_station_map
# 4326

station_ppp

[1] 2647


relative_int(station_ppp, zev_ppp)

positive and significant, clustering of like values

a computational approach.

   Low Medium   High 
    70    332   2245 
Low Medium High
67 250 2345
55 263 2344
61 274 2327
57 261 2344
[1] 1000    3

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
 0.000000  0.000000  0.000000  0.007889  0.000000 10.000000 

References

International Earth Science Information Network (CIESIN) - Columbia University, Center for. 2018. “Gridded Population of the World, Version 4 (GPWv4): Population Count Adjusted to Match 2015 Revision of UN WPP Country Totals, Revision 11.” Palisades, NY: NASA Socioeconomic Data; Applications Center (SEDAC). https://doi.org/10.7927/H4PN93PB.

Footnotes

  1. the reason that I did not simply use ZCTAs is because though the codebook provided describe the location variable as the center of postal code areas, but the longitude and latitude provided did not align well with the centroids of ZCTAs from tiger/line shapefile, with cases that more than one points fall within a ZCTA↩︎

  2. sherrie xie↩︎

  3. Center for International Earth Science Information Network - CIESIN - Columbia University (2018). Gridded Population of the World, Version 4 (GPWv4): Population Count Adjusted to Match 2015 Revision of UN WPP Country Totals, Revision 11. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). doi: 10.7927/H4PN93PB. Accessed December 4, 2024.↩︎

  4. these peripheral areas might be affected by artificial lines?↩︎